COMPRESSED SENSING: 
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Abstract. Compressed Sensing (CS) seeks to recover an unknown vector with N entries by making far fewer 
than N measurements; it posits that the number of compressed sensing measurements should be comparable to the 
information content of the vector, not simply N. CS combines the important task of compression directly with the 
measurement task. Since its introduction in 2004 there have been hundreds of manuscripts on CS, a large fraction of 
which develop algorithms to recover a signal from its compressed measurements. 

Because of the paradoxical nature of CS - exact reconstruction from seemingly undersampled measurements - 
it is crucial for acceptance of an algorithm that rigorous analyses verify the degree of undersampling the algorithm 
permits. The Restricted Isometry Property (RIP) has become the dominant tool used for the analysis in such cases. 

We present here an asymmetric form of RIP which gives tighter bounds than the usual symmetric one. We give 
the best known bounds on the RIP constants for matrices from the Gaussian ensemble. Our derivations illustrate 
the way in which the combinatorial nature of CS is controlled. Our quantitative bounds on the RIP allow precise 
statements as to how aggressively a signal can be undersampled, the essential question for practitioners. We also 
document the extent to which RIP gives precise information about the true performance limits of CS, by comparing 
with approaches from high-dimensional geometry. 
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1. Introduction. Consider the task of measuring an unknown vector x G by taking inner 
products with vectors of one's choosing. The obvious choice would be to ask for the inner product 
of x with respect to each of the N canonical unit vectors ej (the j th entry of ej being one and all 
others zero). But what if it is known a priori that x is /c-sparse - i.e. has only k < N nonzero 
entries? Can't one then do better? If the nonzero entries of x are indexed by the set K (x(j) ^ 
if j G K and x(j) = for j G K c \ then only k inner products are needed: those with the canonical 
unit vectors ej for j G K. However, what if K is unknown? Is it still possible to make fewer than 
N measurements of xl 

Questions of this form must have been around for millennia. Consider this puzzle: "A counterfeit 
coin is hidden in a batch of N otherwise similar coins; it is distinguished from the others by its slightly 
heavier weight. How many balance weighings are needed to find the counterfeit?" Abstractly, this 
concerns the special case where K is an unknown singleton and the nonzero value is nonnegative; 
the balance is abstractly the same as an inner product which gives weight +1 to the coefficients 
placed in the "right" pan and -1 to the coefficients placed in the "left" pan. Many people quickly 
find that roughly log(iV) measurements suffice to find the position and value of the nonzero, each 
time putting half the remaining coins in one pan, half in the other, and discarding from further 
consideration the coins that turn out on the light side. Lighthearted as puzzles can sometimes seem, 
they can lead to serious applications. 

During World War Two, efficient screening of large groups of soldiers for certain infections was 
based on the principle of group testing, in which blood from many soldiers is combined in a single 
tube and tested for presence of an infectious agent. If an infection is found, one studies that group 
and by dyadic subdivision eventually isolates the infecteds [26j [34] . 
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More advanced mathematics can do much better than such common-sense ideas. Those with a 
physical bent may quickly see that, if N is prime, again assuming a singleton K and a nonnegative 
x, it will be enough, in fact, to make only 2 inner products, with respectively a sine and cosine 
of frequency 2tt/N; the phase of the corresponding complex Fourier coefficient immediately reveals 
the position of the nonzero. Note here that, for large TV, we are doing dramatically better than 
common- sense (2 measurements rather than log (AT)). 

Advanced mathematics is better than the common-sense approach in another way: common- 
sense uses adaptive measurements, where the next measurement vector is selected after viewing all 
previous measurements. In the advanced approach, adaptivity is unnecessary: one simply makes 2 
measurements defined a priori and later combines the two to reconstruct. 

Compressed Sensing (CS) embodies the advanced approach: it designs a special matrix A of size 
nxN, measures x via y = Ax, giving n measurements of the A" vector x in parallel, and reconstructs 
x from (y, A) using computationally efficient and stable algorithms. The key point is that n can be 
taken much smaller than A/", and much closer to k. For example, if x is known to be /c-sparse and 
nonnegative n = 2k + 1 suffices [21; and if x is only known to be /c-sparse, roughly n = 2 \og(N/n) • k 
will suffice, if k/N is small [22]. 

Since the release of the seminal CS papers in 2004, [10] [8j [17], a great deal of excitement has 
been generated in signal processing and applied mathematics research, with hundreds of papers on 
the theory, applications, and extensions of compressed sensing (more than 400 of these are collected 
at Rice's online Compressive Sensing Resources archive dsp . rice . edu/cs). Many applications have 
been proposed, including magnetic resonance imaging j40j [41] , radar [45], and single-pixel cameras 
[28] to name a few. In the MRI applications, it has been reported that diagnostic quality images 
can be obtained in 1/7 the recording time using CS approaches, [39 . For a recent review of CS see 
the special issue containing [28, 40 and for a review of sparse approximation see [5]. 

In CS the matrix A and reconstruction algorithm are referred to as an encoder/decoder pair and 
much of the research has focused on their construction; that is, how should the measurement matrix 
A be selected and what are the most computationally efficient and robust algorithms for recovering x 
given y and Al The two most prevalent encoders in the literature construct A by drawing its entries 
independently and identically from a Gaussian normal distribution, or by randomly sampling its 
rows without replacement from amongst the rows of a Fourier matrix. These enconders are popular 
as they are amenable to analysis, and they can be viewed as models of matrices with mean-zero 
entries and fast matrix-vector products, respectively. The most widely-studied decoder has been 
^-minimization, 

(1.1) min ||z||i subject to Az = y, 

zeM N 

which is the convex relaxation of the computationally intractable decoder, [42], seeking the sparsest 
solution in agreement with the measurements 

(1.2) min ||z||o subject to Az = y. 

zeR N 

Following the usual convention in the CS community, ||z||o counts the number of nonzero entries in 
z. Many other encoder /decoder pairs are also being actively studied, with new alternatives being 
proposed regularly; see Section [3] 

Here we do not review these exciting activities, but focus our attention on how to interpret 
the existing theoretical guarantees; in particular, we believe an important task for theory is to 
correctly predict the triples (k : n : N) for which a given encoder /decoder will successfully recover the 
measured signal, or a suitable approximation thereof. To exemplify this, we restrict our attention to 
a now-standard encoder/decoder pair: A Gaussian and ^-minimization. This pair offers the cleanest 
mathematical structure, giving us the chance to make the strongest and clearest statements which 
can be made at this time, for example by drawing on the existing wealth of knowledge in random 
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matrix theory and high-dimensional convex geometry. In this paper we focus almost exclusively on 
the most widely used tool for analyzing the performance of encoder/decoder pairs, the Restricted 
Isometry Property (RIP) introduced by Candes and Tao [TT] . 

Definition 1.1 (Restricted Isometry Property). A matrix A of size n x N is said to satisfy 
the RIP with RIP constant R(k, n, N; A) if for every x G X N "(*) := {x eR N : \\x\\ < k}, 

(1.3) R(k,n,N;A) := min c subject to (1 - c)\\x\\ 2 2 < \\Ax\\% < (1 + c)\\x\\ 2 2 . 

c>0 

As suggested by the name, the RIP constants measure how much the matrix A acts like an isometry 
when "restricted" to k columns; it describes the most significant distortions of the £ 2 norm of any 
^-sparse vector. Typically, R(k, n, TV; A) is measured for matrices with unit ^ 2 -norm columns, and 
in this special case R(l,n,N) = 0. Specifically, the RIP constant R(k,n,N; A) is the maximum 
distance from 1 of all the eigenvalues of the (^) submatrices, A^Ak, derived from A, where K is 
an index set of cardinality k which restricts A to those columns indexed by K. 

It is important to note that the RIP is predominently used to establish theoretical performance 
guarantees when either the measurement vector y is corrupted with noise or the vector x is not strictly 
fc-sparse. Proving that an algorithm is stable to noisy measurements is essential for applications 
since measurements are rarely free from noise. In this paper, we focus on the ideal noiseless case 
with the hopes of investigating the best possible theoretical results. For the noisy case, see pQ for 
^-minimization for q G (0, 1] and [3] for greedy algorithms. 

For many CS encoder/decoder pairs it has been shown that if the RIP constants for the encoder 
remain bounded as n and N increase with n/N — » 5 G (0, 1), then the decoder can be guaranteed 
to recover the sparsest x for k up to a critical threshold, which can be expressed as a fraction of n, 
p(S) • n. Typically each encoder /decoder pair has a different p(S). Little is generally known about 
the magnitude of p(S) for encoder /decoder pairs, making it difficult for a practitioner to know how 
aggressively they may undersample, or which decoder has stronger performance guarantees. (For 
a recent review of compressed sensing algorithms, including which have p(S) > 0, see [43[ Section 
7].) In this paper, we endeavor to be as precise as possible about the value of the RIP constants for 
the Gaussian ensemble, and show how this gives quantitative values for p(5) for the ^-minimization 
decoder. Similar results for other decoders are available in |3]. 

To quantify the sparsity/undersampling trade off, we adopt a proportional- growth asymptotic, 
in which we consider sequences of triples (fc, n, N) where all elements grow large in a coordinated 
way, n ~ 5 N and k ~ pn for some constants 8,p > 0. This defines a two-dimensional phase space 
(£, p) in [0, l] 2 for asymptotic analysis. 

Definition 1.2 (Proportional-Growth Asymptotic). A sequence of problem sizes (k,n,N) is 
said to grow proportionally if for (5, p) G [0, 1] 2 ; — >• S and ^ — » p as n — » oo. 

Ultimately, we want to determine, as precisely as possible, which subset of this phase space 
corresponds to successful recovery and which subset corresponds to unsuccessful recovery. This 
is the phase-transition framework advocated by Donoho et al [16l UHl [2Ql 1211 El] ; see Section [3] 
for a precise definition. By translating the sufficient RIP conditions into the proportional-growth 
asymptotic, we find lower bounds on the phase-transition for (5, p) in [0, l] 2 . An answer to this 
question plays the role of an under sampling theorem: to what degree can we undersample a signal 
and still be able to reconstruct it? 

The central aims of this paper are: 

• to shed some light on the behavior of the RIP constants of a matrix ensemble with as much 
precision as possible; 

• to advocate a unifying framework for the comparison of theoretical CS results by showing 
the reader how to interpret and compare some of the existing recovery guarantees for the 
prevalent i 1 decoder; 

• to introduce a reader new to this topic to the type of large deviation analysis calculations 
often encountered in CS and applicable to many areas faced with combinatorial challenges. 
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In pursuit of these goals, we sharpen the use of the RIP and squeeze the most out of it, quantifying 
what can currently be said in the proportional-growth asymptotic and thereby making precise the 
undersampling theorems the RIP implies. We proceed in Section [2] along two main avenues. First, 
we concentrate on Gaussian matrices, using bounds on their singular values we develop the sharpest 
known bounds on their RIP constants; in fact, these are the the best known bounds of any class 
of matrices in the proportional-growth asymptotic with n < N . Second, we use an asymmetric 
definition of the RIP where the lower and upper eigenvalues are treated separately, and in doing so 
further improve the conditions in which the RIP implies CS decoders recover the measured signal. 
In Section |3] we combine these two improvements to exhibit a region of the (5, p) phase space where 



RIP analysis shows that undersampling will be successful for the ^-minimization decoder (1.1). 

The RIP is not the only tool used to analyze the performance of CS decoders. The different 
methods of analysis lead to results that are rather difficult to compare. In Section pO] we describe 
in the proportional-growth asymptotic, with A Gaussian and the ^-minimization decoder, two al- 
ternative methods bounding the phase transition: the polytope analysis [16] [TSJ [22] of Donoho and 
Tanner and the geometric functional analysis techniques of Rudelson and Vershynin [46] . By trans- 
lating these two methods of analysis and the RIP analysis into the proportional-growth asymptotic, 
we can readily compare the results obtained by these three techniques by comparing the regions of 
the (£, p) phase space where each method of analysis has guaranteed successful recovery. In particu- 
lar, we find that for the Gaussian encoder, the RIP, despite its popularity, is currently dramatically 
weaker than the other two approaches in the strength of conclusions that it can offer. However, this 
limitation is counterbalanced by RIP being successfully applied to a broad class of encoder/decoder 
pairs, and seemlessly also proving stability to noisy measurements and compressible signals. 

We conclude with a discussion of some other important and related topics not addressed in the 
current paper. We briefly discuss comparisons of results when noise is present in the measurements 
or the signal x is not perfectly /c-sparse, average case analysis versus the theoretical worst case 
analysis presented here, and the potential to improve the phase transition curves through improved 
analysis or improved bounds. 

2. Bounds on RIP for Gaussian Random Matrices. Let K C {1,...,7V} be an index 
set of cardinality k which specifies the columns of A chosen for a submatrix, i^, of size n x k. 
Explicitly computing R(k, n, N; A) would require enumerating all (^) subsets K of the columns of 
A, forming each matrix Gk — ^k^k^ and calculating their largest and smallest eigenvalues. We 
have never seen this done except for small sizes of N and so not much is known about the RIP 
constants of deterministic matrices. Fortunately, analysis can penetrate where computation becomes 
intractable. Associated with a random matrix ensemble is an, as of yet unknown, probability density 
function for R(k, n, N). Let us focus on the Gaussian ensemble where much is already known about 
its eigenvalues. We say that annxiV random matrix A is drawn from the Gaussian ensemble of 
random matrices if the entries are sampled independently and identically from the standard normal 
distribution, A/"(0, n _1 ). (The n _1 scaling in the Gaussian ensemble cause the I 2 norm of its columns 
to have expectation 1.) We say that a kxk matrix W Uj k is a Wishart matrix if it is the Gram matrix 
X T X of an n x k matrix X from the Gaussian ensemble. The largest and smallest eigenvalues of a 
Wishart matrix are random variables, denoted here A™^ = X max (W n ^) and A.™™ = A mm (H / n5 / c ). 
These random variables tend to defined limits, in expectation, as n and k increase in a proportional 
manner. With | -> p as n 00, we have £ (A™ a k x ) (1 + y/p) 2 and £(A™ n ) -*►(!- ^/p) 2 ; 



U, see Figure 2.1 Explicit formulas bounding A™^ and A.™™ are available [30]. An empirical 



2.2 



approximation of the probability density functions of A™^ and A™^ 71 is shown in Figure 

The asymmetric way that the expected eigenvalues A™^ and A^ n deviate from 1 suggests that 
the symmetric treatment used by the traditional RIP is missing an important part of the picture. 
We generalize the RIP to an asymmetric form and derive the sharpest recovery conditions implied 
by the RIP. 

Definition 2.1 (Asymmetric Restricted Isometry Property). For a matrix A of size n x N , 
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0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0. 



p = k/n 

Fig. 2.1. Expected values of the largest and smallest eigenvalues of a Wishart matrix W n ^ with p = ^. Note 
the asymmetry with respect to 1. 




Fig. 2.2. Empirical Distributions of the Largest and Smallest Eigenvalues of a Wishart Matrix. A 

collection of frequency histograms of A™ x and AJ^ n ; x-axis - size of the eigenvalue; y-axis - number of occurrences; 

z-axis - ratio p = ^ of the Wishart parameters. Overlays: curves depicting the expected values (1 ± yfp) 2 of 
and A™ l fc n . Here n = 200 
larger n, the concentration would be tighter. 



At this value of n it is evident that A™ x 



and AJ^ n lie near, but not on curves. For 



the asymmetric RIP constants L(k,n, N; A) and U(k,n,N;A) are defined as: 

(2.1) L(k,n,N; A) := min c subject to (1 - c)\\x\\ 2 2 < \\ Ax\\ 2 2 , for all x e X^O); 

c>0 

(2.2) U(k, n, N; A) := min c subject to (1 + c)\\x\\l > \\Ax\\l , for all x e X N (k). 

c>0 

(A similar change in the definition of the RIP constants was used independently by Foucart and 
Lai in [32], motivated by different concerns.) 

Remark 1. Although both the smallest and largest singular values of AJ^Ak effect the stability 
of the reconstruction algorithms, the smaller eigenvalue is dominant for compressed sensing in that 
it allows distinguishing between sparse vectors from their measurement by A. In fact, it is often 
incorrectly stated that R(2k, n,N) < 1 is a necessary condition to ensure that there are no two k- 
sparse vectors, say x and x' , with the same measurements Ax = Ax' ; the actual necessary condition 
is L(2k,n,N) < 1. 



We see from (2.1) and (2.2) that (1 - L(k, n, N)) = mm K \ min (G K ) and (1 + U(k, n, N)) 
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maxK \ max (G k) with Gr — ^rAk- A standard large deviation analysis of bounds on the prob- 
ability density functions of A™^ and A™]™ allows us to establish upper bounds of L{k, n, N) and 
U(k,n,N) which are exponentially unlikely to be exceeded. 

Definition 2.2 (Asymptotic RIP Bounds). Let A be a matrix of size n x N drawn from 
the Gaussian ensemble and consider the proportional- growth asymptotic (j^ —> S and ^ — >• p as 
n — » oo). Let H(p) := p\og(l/p) + (1 — p)log(l/(l — p)) denote the usual Shannon Entropy with 
base e logarithms, and let 

(2.3) VW (A, p) := H{p) + ^ [(1 - p) log A + 1 - p + p log p - A] , 

(2.4) Vw(A,p) := ^[(l + p)logA + l + p-plogp-A]. 



Define \ m%n (5,p) and \ max (5,p) as the solution to (2J3) and respectively: 
(2.5) ^n(A mm ((5,p),p)+i7(p5) =0 for \™ n (5,p) < 1-p 



(2.6) #ma,(A m ^((5,p),p) + i?(p(5) = /or A ma *(M > 1 + p. 
Define C{5, p) and U(5, p) as 

(2.7) £((5, p) := 1 - A min (£, p) and W(J, p) := min A ma *(£, i/) - 1. 

^e[p,i] 



To facilitate ease of calculating £(5, p) and U(5, p), web forms for their calculation are available 
at ecos . maths . ed . ac . uk. 

In the proportional growth asymptotic, the probability that C{5, p) and U(5,p) bound the ran- 
dom variables L(k,n,N) and U(k,n, iV), respectively, tends to 1 as n — » oo. In statistical termi- 
nology, the coverage probability of the upper confidence bounds C(5, p) and U(5, p) tends to one as 
n —> oo. In fact, all probabilities presented in this manuscript converge to their limit "exponentially 
in n" ; that is, the probability for finite n approaches its limit as n grows with discrepancy bounded 
by a constant multiple of e~ n ^ for some fixed (3 > 0. 

Theorem 2.3 (Validity of RIP Bounds). Fix e > 0. Under the proportional- growth asymptotic, 
Definitional^ sample each n x N matrix A from the Gaussian ensemble. Then 

Prob (L(fc, n, N;A) < £ (5, p) + e) 1 and Prob (U(k, n, N;A) <U (5, p) + e) 1 



exponentially in n. 

Remark 2. Extensive empirical estimates of L(k,n,N) and U(k,n,N) show that the bounds 
£(5, p) and U(5, p) are rather sharp; in fact, they are no more than twice the actual upper bounds on 
L(fe, n, N) and U (fc, n, N), see Figure 2.4 and Table \2.l\ and are much closer for the region applicable 
for GS decoders, p <C 1. The empirically observed lower bounds on L(k,n,N) and U(k,n,N) are 
calculated through the following process. The number of rows, n, is fixed at one of the values in 



Table 2.1, For each n, ^7 values of N are selected so that n/N ranges from 1/20 to 20/21. For 
each (n,N) a matrix A of size n x N is drawn from A/"(0,n _1 ) and either the algorithm from \2^j 
or is applied to determine support sets of size fc = l,2,...,n — 1 which are candidates for the 
support sets that maximize L(k, n, N; A) or U (fc, n, N; A) . The largest or smallest eigenvalue of each 
resulting nx k submatrix is calculated and recorded. The above process is repeated for some number 
of matrices, see the caption of Table \2.f\ and the maximum value recorded. The empirical calculation 
of RIP constants are lower bounds on the true RIP constants as the support sets calculated by f2j_ 
and J22J/ may not be the support sets which maximize the RIP constants. 
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Fig. 2.3. The RIP bounds of Eq. \2.7\ . Level sets of C(S, p) (left panel) and U(S, p) (right panel) over the 
phase space (S,p) £ [0,1] 2 . For large matrices from the Gaussian ensemble, it is overwhelmingly unlikely that the 
RIP constants L(k,n, N; A) and U(k,n, N; A) will be greater than these values. 




0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 



Fig. 2.4. Empirically observed lower estimates of RIP bounds of RIP constants. Although there is 
no computationally tractable method for calculating the RIP constants of a matrix, there are efficient algorithms 
which perform local searches for extremal eigenvalues of submatrices; allowing for observable lower bounds on the 
RIP constants. Algorithms for lower bounding L(k,n, N) , [21], and U(k,n, N), f38f , were applied to dozens of A 
drawn Gaussian A/"(0, n~ 1 ) with n = 400 and N increasing from 420 to 8000. Level sets of the observed L(k, n, N; A) 
(left panel) and U (k, n, N; A) (right panel). 



max 



200 
400 



L(k,n,N) 

1.22 
T32 



max 



U(k,n,N) 

1.83 
1.81 



Table 2.1 

The maximum ratio of the RIP bounds in Theorem \2.3\ to empirically observed values. For each of the ratios 
n/N tested, multiple matrices were drawn and empirical low bounds on their RIP constants calculated. For n = 200 
between 9 and 175 matrices were drawn for each n/N, and for n = 400 between 7 and 489 matrices were drawn for 
each n/N. Our bounds are numerically found to be within a multiple of 1.83 of empirically observed lower bounds. 



2.1. Proof of Theorem |2.3[ In order to prove Theorem |2.3[ this section employs a type of 
large deviation technique often encountered in CS and applicable in fact, to many areas faced with 
combinatorial challenges. 

We first establish some useful lemmas concerning the extreme eigenvalues of Wishart matrices. 
The matrix A generates (^) different Wishart matrices Gk = A^Ak- Exponential bounds on the 
tail probabilities of the largest and smallest eigenvalues of such Wishart matrices can be combined 
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with exponential bounds on (^) to control the chance of large deviations using the union bound. 
This large deviation analysis technique is characteristic of proofs in compressed sensing. By using 
the exact probability density functions on the tail behavior of the extreme eigenvalues of Wishart 
matrices the overestimation of the union bound is dramatically reduced. We focus on the slightly 
more technical results for the bound on the most extreme of the largest eigenvalues, U(5,p), and 
prove these statements in full detail. Corresponding results for C(5, p) are stated with their similar 
proofs omitted. 

The probability density function, f ma x(k, n; A), for the largest eigenvalue of the k x k Wishart 
matrix A^Ak was determined by Edelman in [29]. For our analysis, a simplified upper bound 
suffices. 

Lemma 2.4 (Lemma 4.2, pp. 550 [29 ). Let Ak be a matrix of size n x k whose entries are 
drawn i.i.d from A/"(0, n _1 ). Let f ma x(k, n\ A) denote the probability density function for the largest 
eigenvalue of the Wishart matrix A^Ak of size k x k. Then f ma x(k, n; A) satisfies: 

(2.8) fma X (k,n;\) < 



(2^) 1 / 2 (nA)- 3 / 2 



nX 

T 



(n+fc)/2 



ni 



r(f) 



i\/2 



c (fe,n; A). 



For our purposes, it is sufficient to have a precise characterization of g ma x(k, n\ A)'s exponential 
(with respect to n) behavior. 

Lemma 2.5. Let k/n = p g (0, 1) and define 

^max(\p) := g [(l + p)logA + l + p-plogp- A], 

Then 



(2.9) 



fmax(k, n; A) < Pmax(n, A) exp(n • VWr(A, p)) 



where Pmax(n, A) is a polynomial in n, A. 

Proof Let g m ax(k^n; X) be as defined in ( |2.8| ) and let p n = k/n. To extract the exponential 
behavior of g max (k,n; X) we write Mog(# max (/c, n; A)) = $i(/c, n; A) + $ 2 (/c, n; A) + $ 3 (/c, n; A) where 



1 3 

$i (fc, n; A) = — log (2tt) - — log (nX) 
In In 



<£>2(k,n;X) 



$ 3 (k,n;X) 



(1 + p n ) log 



(1) 



Clearly, lin^^oo <l>i(/c, n; A) = and can be subsumed as part of Pmaxin^X). To simplify $3, we 
apply the second of Binet's log gamma formulas [52j Sec. 12.32], namely log(r(z)) = (z — 1/2) logz — 
z + log + / where / is a convergent, improper integral. With c(n, p) representing the constant 
and integral from Binet's formula we then have 



$ 2 (k,n; A)+$ 3 (/c,n; A) 



(l + p n )logA 



IV 2 n n 

p n log p n + - log - + p„ 

n / n 2 



1-A 



1 



c(n,p n ) 



As liu^^oo n x c(n, p n ) = it can be absorbed into p ma x( n . A) and we have 



VWc(A,p):= lim - log [g ma x(k, n\ A)] = \ [(1 + p) log A - plogp + p + 1 - A] 

n— )*oo 77, Z 



and the conclusion follows. □ 
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To bound /7(/c,n, TV), we must simultaneously account for all (^) Wishart matrices A^Ak 
derived from A. Using a union bound this amounts to studying the exponential behavior of 
C^)g>max(k>, n\ A). In the proportional-growth asymptotic this can be determined by characteriz- 
ing liniAr^oo log (^) (fc,n; A) 



which from Lemma 



2.5 



lim — log 

AT^oo N 



5 (fc,n; A) 



= lim — log 



is given by 
" J im T? l °g[9rnax(n,k; A)] 



H 



N 



5 lim - log [gmax(n, k; A)] 

n— )>oo 77, 



(2.10) 



iZ"(p£) + 5lj)max{\ p) =' Slpu(S, Q\ A). 



Recall that H(p) := plog(l/p) + (1 — p)log(l/(l — p)) is the usual Shannon Entropy with base e 
logarithms. 

Equipped with Lemma [2^5] and ( |2.10| ), Proposition 2.6 establishes \ max (5, p) — 1 as an upper 
bound on U(k,n,N) in the proportional-growth asymptotic. 

Proposition 2.6. Let 5, p e (0, 1), and A be a matrix of size n x N whose entries are drawn 
i.i.d. from A/"(0,n _1 ). Define U(5,p) := \ max (5, p) — 1 where \ max (S,p) is the solution to (2.6). 
Then for any e > 0, in the proportional- growth asymptotic 



Prob 



(u(k,n,N) >U(5,p) + 6^j 



-> 



exponentially in n. 

Proof Throughout this proof 5 and p are fixed, and we focus our attention on A, often abbre- 
viating ipu(5,p]\) in (2.10) as ipu(X). We first verify that (2.6) has a unique solution. Since 

d i m 1 + ? 1 
dA^ (A) = 2 " 1 

ipuW is strictly decreasing on [1 + p, oo) and is strictly concave. Combined with 



+ =5- 1 H(p5) + 



(1 + p) log(l + p)+plog 



> 



and lim^oo ^[/(A) = — oo, there is a unique solution to (2.6), namely \ max (5, p). 

Select e > and let (k,n,N) be such that = 5 n \ ^ = p n . First, we write the probability 
statement in terms of \ max (5 n , p n ): 



Prob 



U{k, n, N) > U(5 n , p n ) + e)J = Prob [U{k, n, N) > \ max (5 n , p n ) - 1 

= Prob [1 + U(k, n, AT) > \ max (5 n , p n ) 

fmax(k,n;\)d\ 



■*)] 



(2.11) 



< 



A mas (5 n ,p n )+e 



; (fc, n; A)dA. 



To bound the integral in (2.11) in terms of gmax(5, p] \ max (Sn, Pn)) we write g n 
terms of n, p n , and A as gmax(k,n; A) = (/?(n, p n )\~ 2 At( 1 +^)e~ f A where 



3 (fc, n; A) in 



^(n,Pn) = (27r)^n 3 



n\ f (!+Pn) 



r(fpn)r(f)- 
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Since \ max (5 n , p n ) > l+p n , the quantity A 2 ( 1 +^) e 2 A is strictly decreasing in A on [\ max (5, p n ), oo). 
Therefore we have 



A maa: (5 Tl ,p Tl )+e 



(2.12) 



A rnaa3 (5 ri ,p ri )+e 



c (fc, n; A)eZA < ^(n, p n ) (X max (S n , p n ) + e )^ l+pn) e"* (A mM («n,P„)+e) 

= (A max (5„,p„)+e) f ff m a ;c (fc,n;A mra (^ )/ 9„) + e) /"* A" 
= 2 (A ma *(S n , p n ) + e) (fc, n; \ max (5 n , p n ) + e) . 



A~2^A 



2dA 



Therefore, combining (2.11) and (2.12) we obtain 
Prob U(k, n, N) > U(5 n , p n ) + e)l < 2 (A ma *(£ n , p n ) + e) 



TV 



9r> 



(k,n;\™ ax (5 n ,p n )+e) 



(2.13) 



< Pma, (n, A ma *(J n , p n )) exp [n • ^ (A ma *(J n , p n ) + e)] 

< Pmax (n, A maX (J n , p n )) exp ^ 



with the last inequality following from the strict concavity of ipu{X). Since j^ipu (A ma:E (5, p)) < is 
strictly bounded away from zero and lim^oo X max (S n , p n ) = \ max (5, p), we arrive at, for any e > 



lim Prob 

n— ^oo 



U(k,n,N) >U(5,p) + e) 



0. 



□ 

The term H(pS) in (2.10), from the union bound over all (^) matrices A^Ak, results in an 
overly pessimistic bound in the vicinity of pS = 1/2. As we are seeking the least upper bound 
on U(k,n,N) we note that any upper bound for U(j,n,N) for j > k is also an upper bound for 
U (fc, n, N), and replace the bound U(5, p) with the minimum of U(5, v) for v G [p, 1]. 

Proposition 2.7. Let J, p e (0, 1), and define U(5, p) := min^^] U(5, v) with U(5, v) defined 
as in Proposition 2.6 , For any e > 0, in the proportional- growth asymptotic 

Prob (U(k, n, N) > p) + e) -> 

exponentially in n. 

Proof. By the definition of x N ( k) in Definition |l.l[ U(j,n,N) > U(k J n J N) for j = fc + 1, fc + 



2, . . . , n; combined with Proposition 

Prob 



2.6 



for 



^ as n — >• oo 



(u(j,n,N) >«(*,i/) + e) 



-> 



exponentially in n, and taking a minimum over the compact set v G [p, 1] we arrive at the desired 
result. □ 

A similar approach leads to corresponding results for p). Edelman also determined the 
probability density function, fmin(k^n; A), for the smallest eigenvalue of the k x k Wishart matrix 
A^Ak [29]. Here again, a simplified upper bound suffices: 

Lemma 2.8 (Prop. 5.2, pp. 553 [29]). Let Ak be a matrix of size n x k whose entries 
are drawn i.i.d. from A/"(0,n _1 ). Let fmi n (k,n; A) denote the probability density function for the 
smallest eigenvalue of the Wishart matrix A^-Ak of size k x k. Then fmin(k,n; A) satisfies: 
(2.14) 



^n;A)<(^) 



1/2 



-n\/2 



(n-fe)/2 



r(=±i) 



r(§)r(=f±i)r(=f±*) 



(fe,n; A). 



Compressed Sensing: How sharp is the RIP? 



11 



With Lemma [2.8[ we establish a bound on the asymptotic behavior of the distribution of the 
smallest eigenvalue of Wishart matrix of size k x k. 
Lemma 2.9. Let k/n = p G (0, 1) and define 

^min{\p) '= H(p) + - [(1 - p)logA + 1 - p + plogp- A] . 

Then 

(2.15) fmin(k, n; A) < p m in(n, A) exp(n • VWi(A, p)) 

where p m in(n, A) is a polynomial in n, A. 

With Lemma [2T9| the large deviation analysis yields 



(2.16) lim -*-\og 

V 1 N^oo N 



N 



Qmin 

(fe,n; A) 



H(pS) +^min(A,p). 



Similar to the proof of Proposition |2.6| Lemma 2.9 and (2.16) are used to establish £(5, p) as an 



upper bound on L(fc, n, iV) in the proportional-growth asymptotic. 

Proposition 2.10. Let 5,p G (0, 1], and A be a matrix of size n x N whose entries are drawn 
i.i.d. fromM^.n- 1 ). Define C(6,p) := l-\ min (5,p) where \ min (5,p) is the solution to Then 
for any e > 0, in the proportional- growth asymptotic 

Prob n, N) > C{5, p) + e) 

exponentially in n. 

The bound C(5,p) is strictly increasing in p for any (5 G (0, 1), and as a consequence no tighter 



bound can be achieved by minimizing over matrices of larger size as was done in Proposition 2.7 



3. RIP Undersampling Theorems. The high level of interest in compressed sensing is due 
to the introduction of computationally efficient and stable algorithms which provably solve the 



seemingly intractable (1.2) even for k proportional to n. New compressed sensing decoders are 
being introduced regularly; broadly speaking, they fall into one of two categories: greedy algorithms 
and regularizations. Greedy algorithms are iterative, with each step selecting a locally optimal subset 
of entries in x which are adjusted to improve the desired error metric. Examples of greedy algorithms 
include Orthogonal Matching Pursuit (OMP) [50], Regularized OMP (ROMP) [44], Stagewise OMP 
(StOMP) [25 , Compressive Sampling MP (CoSaMP) [43], Subspace Pursuit (SP) [H], and Iterated 
Hard Thresholding (IHT) [4 . Regularization formulations for sparse approximation began with 
the relaxation of \1.2\ to the now ubiquitous (convex) i 1 -minimization [14!, ( |1.1[ ), and has since 
been extended to non-convex ^-minimization for q G (0, 1), [35j [32] [13] [12] [47] . Although general- 
purpose convex optimization solvers may be employed to solve I 1 -minimization ( |1.1[ ), highly-efficient 
software has been recently designed specifically for i 1 -minimization in the context of compressed 
sensing, see [l4 | [3T ] l5T ] 54 . Non-convex formulations have sometimes been able to offer substantial 
improvements, but at the cost of limited guarantees that the global minima can be found efficiently, 
so it remains unclear how practical they really are. 

As stated at the end of the introduction, one of the central aims of this article is to advocate 
a unifying framework for the comparison of results in compressed sensing. Currently there is no 
general agreement in the compressed sensing community on such a framework, making it difficult to 
compare results obtained by different methods of analysis or to identify when new results are im- 
provements over existing ones. Donoho has put forth the phase transition framework borrowed from 
the statistical mechanics literature and used successfully in a similar context by the combinatorial 
optimization community, see [36 , 37 . This framework has been successfully employed in compressed 



sensing by Donoho et al, [2Q]l2T| l24]. 
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Fortunately, every compressed sensing algorithm that has an optimal recovery order of n pro- 
portional to k can be cast in the phase transition framework of Donoho et al., parametrized by two 
inherent problem size parameter^} 

• the undersampling rate of measuring x through n inner products with the rows of A, as 
compared to directly sampling each element of x G ~R. N \ 

S n = n/N G (0, 1) 

• the overs ampling rate of making n measurements as opposed to the optimal oracle rate of 
making k measurements when the oracle knows the support of x: 

p n = k/n G (0,1). 

For each value of S n G (0, 1) there is a largest value of p n which guarantees successful recovery of x. 
We now formalize the phase transition framework described above. 

Definition 3.1 (Strong Equivalence). The event StrongEquiv( A, alg) denotes the following 
property of an n x N matrix A: for every k-sparse vector x, the algorithm "alg" exactly recovers 
x from the corresponding measurements y = Ax. 

For most compressed sensing algorithms and for a broad class of matrices, under the proportional- 
growth asymptotic there is a strictly positive function ps(S;ALG) > defining a region of the (5, p) 
phase space which ensures successful recovery of every /c-sparse vector x G X N (k)- This function, 
ps(S]ALG), is called the Strong phase transition function pTU| \W\ H8]. 

Definition 3.2 (Region of Strong Equivalence). Consider the proportional- growth asymptotic 
with parameters (5, p) G (0,1) x (0,1/2). Draw the corresponding n x N matrices A from the 
Gaussian ensemble and fix e > 0. Suppose that we are given a function ps(S ;ALG) with the property 
that, whenever < p < (1 — e)ps(S;ALG), Prob(StrongEquiv(A,ALG)) — >> 1 as n — ^ oo. We say that 
ps(S ;ALG) bounds a region of strong equivalence. 

Remark 3. The subscript S emphasizes that the phase transition function ps(S ;ALG) will define 
a region of the (5,p) phase space which guarantees that the event StrongEquiv ( A, alg) is satisfied 
with probability on the draw of A converging to one exponentially in n. This notation has been 
established in the literature by Donoho and Tanner flBj [2fl/ to distinguish strong equivalence (i.e. 
that every k-sparse vector x is successfully recovered) from weak equivalence (i.e. all but a small 
fraction of k-sparse vectors are successfully recovered). For example, fTU\, [21 )/ study the event where 
i 1 -minimization ( |1.1[ ) exactly recovers x from the corresponding measurements y = Ax, except for a 
fraction (1 — e) of the support sets. 

For the remainder of this section, we translate existing guarantees of Strong Equiv (A, i 1 ) into 
bounds on the region of strong equivalence in the proportional- growth asymptotic; we denote 
psiS',1 1 ) = ps(d)- A similar presentation of other CS decoders is available in [3 j. In order to 
make quantitative statements, the matrix or random matrix ense mble must first be specified, [2]; we 
again consider A drawn from the Gaussian ensemble^] In Section 3.1 we demonstrate how to incor- 



porate the RIP bounds from Section [2] into results obtained from an RIP analysis. In Section |X2| we 
compare bounds on the region of StrongEquiv(A 1 i 1 ) proven by three distinct methods of analysis: 
eigenvalue analysis and the RIP [32], geometric functional analysis [46], and convex polytopes [T6] . 

3.1. Region of Strong Equiv (A, i 1 ) implied by the RIP. In this section, we incorporate the 
bounds on RIP constants established in Section[2]into a known condition implying Strong Equiv(A, i 1 ) 
obtained from an RIP analysis. Following the pioneering work of Candes, Romberg, and Tao [8) ITT], 
many different conditions on the RIP constants have been developed which ensure recovery of every 



^-For some algorithms, such as ^-regularization, these two parameters fully characterize the behavior of the 
algorithm for a particular matrix ensemble, whereas for other algorithms, such as OMP, the distribution of the 
nonzero coefficients also influences the behavior of the method. 

2 Similar results have been proven for other random matrix ensembles, but they are even less precise than those 
for the Gaussian distribution. 
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^-sparse vector via ^-minimization, [6] [3 [9] [10] [46] to name a few. The current state of the art RIP 
conditions for i 1 -minimization were developed by Foucart and Lai [32] . 

Theorem 3.3 (Foucart and Lai [32]). For any matrix A of size n x N with RIP constants 
L(2k, n, N) and U(2k, n, N), for 2k < n < N . Define 



(3.1) 



p FL (k,n,N) := 



1 + V2 n+U(2k,n,N) 
4 \l-L(2k,n,N) 



1 



If p FL (k,n,N) < 1, then there is StrongEquiv(A, i 1 ) . 

To translate this result into the phase transition framework for matrices from the Gaussian 
ensemble, we employ the RIP bounds (2.7) to the asymmetric RIP constants L(2fc, n, N) and 
U (2k, n, N). It turns out that naively inserting these bounds into (3.1 ) yields a bound on p FL (k, n, N), 



see Lemma [XS] and provides a simple way to obtain a bound on the region of strong equivalence. 
Definition 3.4 (RIP Region of StrongEquiv(A, i 1 ) ). Define 



(3.2) 



p 



FL (M : = 



l + \/2 fl+U(5,2p) 
l-£(5,2p) 



3.1 



and p s (5) as the solution to \i (5,p) = 1. 

The function p FL (S) is displayed as the red curve in Figure 
Theorem 3.5. Fix e > 0. Consider the proportional- growth asymptotic, Definition \1.2\ with 
parameters (S,p) G (0, 1) x (0, 1/2). Draw the corresponding n x N matrices A from the Gaussian 
ensemble. If p < (1 — e)p FL (8), then Prob(StrongEquiv(A, i 1 )) 1 as n — >• oo. 

Therefore the function p FL ($) bounds a region of strong equivalence for i l -minimization. 
Theorem |3.5| fol lows from Theorem 3.3 an d the validity of the probabilistic bounds on the RIP 

bounds p FL (k, n, TV) in terms of the asymptotic 



constants, Theorem 2.3 In particular, Lemma 



3.6 



RIP bounds £(5,2p) and U(5,2p), by the quantity p FL (5, (1 + e)p) defined in (3.3). If p e (5) is the 
solution to p FL (5, (1 -\-e)p) = 1, then for p < p e (S) we ach ieve the desired bound, p FL (k,n,N) < 1, 
to ensure StrongEquiv(A 1 i 1 ). The statement of Theorem 3.5 follows from relating p e (S) to p FL (5), 
the solution to p FL (5,p) = 1. 

Lemma 3.6. Fix e > 0. Consider the proportional- growth asymptotic with parameters (5, p) G 
(0, 1) x (0, 1/2). Draw the corresponding n x TV matrices A from the Gaussian ensemble. Then 



Prob (p FL (k, n, N) < p FL (5, (1 + e)p)) -> 1 



(3.3) 

exponentially in n. 

Proof. Theorem 2.3 and the form of p FL (5, p) imply a similar bound to the above with a modified 
dependence on e. For any ce > 0, with n/N ^ 5 e (0,1) and i;/n-^/)G (0,1/2], the probability, on 
the draw of A from the Gaussian ensemble, that 



(3.4) 



p tL (k,n,N) < 



1 + V2 2p) 



ce 



4 \1- £(5,2p) - ce 



is satisfied converges to one exponentially with n. Since U(5, p) is non-decreasing in p and C(5, p) is 
strictly increasing in p for any 8 and p G (0, 1), it follows that the right-hand side of (3.4) can be 
bounded by the right-hand side of (3.3) for any fixed e satisfying < e < 



j_ 

2p 



1, by setting 



c :- 



p d£(5, z) 
2 dz 



>0. 



z=2(l+e)p 



(The upper bound on e is imposed so that the second argument of U(5,-) and £(£, •), 2(1 + e)p, is in 
the admissible range of (0, 1).) That the bound (3.3) is satisfied for all e > sufficiently small, and 
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0.2 0.3 0.4 0.5 0.6 0.7 0.8 



Fig. 3.1. Left panel: Thre e lo wer bounds on the Stron g Eq uivj A, I 1 ) phase transition, p .g(5), for Gaussian 
random matrices from Theorem 3.7 (pg(S), black), Theorem 3.9 (p^ v (5), blue), and Theorem \3.5 \ (pg L (5), red). 
Right panel: The inverse of the Strong Equiv (A, i 1 ) phase transition lower bounds in the left panel. 



that the right hand side of (3.3) is strictly increasing in e establishes that (3.3) is in fact satisfied 
probability on the draw of A that converges to one exponentially in n for any e G ^0,^ — 1^. □ 

Let p e (S) be the solution of u FL (5, (1 
implies that {i FL (k,n, N) < 1, which by Theorem 



Pro of. [ Theorem 



Lemma 



3.6 



e)p) = 1. Then, for any p < p e (S), 



3.3 



ensures Strong Equiv (A, i 1 ) . To 

remove the dependence on the level curve p € (5), note that p e (S) is related to p FL (5), the solution 
of u FL (5,p) = 1, by (1 + e)p e (5) = pf L (£). Since (1 - e) < (1 + e)" 1 for all e > 0, we have 
(1 
□ 



P F s L (S). 

e)p FL (S) < p e (5). Thus, provided p < (1 - 



e)p FL (5), the statement of Theorem 



3.5 



is satisfied. 



3.2. Comparison of bounds on StrongEquiv(A^£ 1 ). In this section we use the phase tran- 
sition framework to readily compare bounds on the region of Strong Equiv (A, i 1 ) obtained from 
vastly different methods of analysis. In Section [XT] we have already determined the region of strong 
equivalence for i 1 -minimization obtained by using the RIP. Here we look at two other examples, 
namely Donoho's polytope results [I6j [18] and the sufficient condition of Rudelson and Vershynin 
[46] obtained from geometric functional analysis. We do not go into great details about how the 
results were obtained, but simply point out that the methods of analysis are rather different. As 
a result, the original statements of the theorems take drastically different forms and are therefore 
difficult to compare even qualitatively. Translating the results into the phase transition framework, 
however, offers a direct, quantitative, and simple method of comparison. 

Using polytope theory and the notion of central-neighborliness, Donoho [16] defined a function 
Pg(S) which defines a region of the (5, p) phase space ensuring StrongEquiv(A, i 1 ) with probability 
on the draw of A converging to one exponentially in n. The phase transition function p® (5) is 



displayed as the black curve in Figure 3.1 



Theorem 3.7 (Donoho [16]). Fix e > 0. Consider the proportional- growth asymptotic, Defir 



tion 



1.2, with parameters (5, p) G (0, 1) x (0, 1/2). Sample each n x N matrix A from the Gaussian 
ensemble. Suppose p < (1 — e)p^(S). Then Prob(StrongEquiv(A, i 1 )) 1 as n — » oo. 

Therefore p$(S) bounds a region of strong equivalence for I 1 -minimization. 

Rudelson and Vershynin [46] used an alternative geometric approach from geometric func- 
tional analysis (GFA) to determine regions of Strong Equiv (A, i 1 ) for Gaussian and random partial 
Fourier matrices. For Gaussian matrices their elegantly simple proof involves employing Gordon's 
"escape through the mesh theorem" on the nullspace of A. Their lower bound on the region of 
Strong Equiv(A, i 1 ) is larger for the Gaussian ensemble than for the Fourier ensemble. We restate 
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their condition for the Gaussian ensemble in the proportional growth asymptotic. 
Definition 3.8 (GFA Region of StrongEquiv(A,i 1 )). Define 

( « ( \og{l + 2log{e/ P 5)) \ 

7(M):=6XP l 41og(e/ M ) J' 



(3.5) 



.RV 



(M :=p(l2 + 81og(l/ Pl 5)-7 2 (M)), 



3.1 



and p§ v (S) as the solution to p RV (5,p) = 1. 

The function p§ v (S) is displayed as the blue curve in Figure 

Theorem 3.9 (Rudelson and Vershynin [46]). Fix e > 0. Consider the proportional- growth 
asymptotic, Definitional^ with parameters (S,p) G (0, 1) x (0, 1/2). Sample each n x N matrix A 
from the Gaussian ensemble. Suppose p < (1 — e)p§ v (6). Then Prob( Strong Equiv( A, i 1 )) —> 1 as 
n — > oo. 

Therefore p§ v (S) bounds a region of strong equivalence for i 1 -minimization. 

Versions of Theorems 3.7 and 3.9 exist for finite values of (fc, n, N), [23] [46], but in each case the 
recoverability conditions rapidly approach the stated asymptotic limiting functions ps(S) as (fc, n, TV) 
grow; we do not further complicate the discussion with their rates of convergence. 



Since Theorems 3.5, 3.7, and 3.9 provide a region of Strong Eq uiv (A, I 1 ), w e no w have three 



and 



3.9 



each have the 



subsets of the exact region of Strong Equiv(A, i 1 ). Although Theorems 3.5, 3.7 
same goal of quantifying the exact boundary of Strong Equiv(A^£ 1 ) for Gaussian random matrices, 
they are arrived at using substantially different methods of analysis. The efficacy of the bounds 
from the largest region of Strong Equiv(A,£ 1 ) to the smallest region are p$(S) of Dono ho, p RV (5) 
of Rudelson and Vershynin, and pf L (£) of Foucart and Lai, see the left panel of Figure 3.1 From 
the inverse of ps(5), see the right panel of Figure 3.1, we can read the constant of proportionality 



where the associated method of analysis guarantees StrongEquiv(A, i 1 ): from Theorems 



3.7 



3.9 



and 3.3 they are bounded below by: n > 5.9/c, n > 56/c, and n > 317/c respectively. 



3.3. Further Considerations. The phase transition framework can also be used to quantify 
what has been proven about an encoder/decoder pair's speed of convergence, its degree of robustness 
to noise, and to make comparisons of these properties between different algorithms. A general frame- 
work for expressing the results of RIP based analyses as statements in the phase transition framework 
is presented in [3], where it is also applied to three exemplar greedy algorithms CoSaMP [43J, Sub- 
space Pursuit [15] , and Iterated Hard Thresholding [4 . Bounds on regions of Strong Equiv (A, £ q ) 
for ^-minimization for q G (0, 1] implied by the RIP are available in Section |4j where the effects of 
noise are also considered. Through these "objective" measures of comparison we hope to make clear 
the proven efficacy of sparse approximation algorithms and allow for their transparent comparison. 

In this article, we have considered only the case of noiseless measurements, Regions of Strong 
Equivalence, and a particular result obtained via an eigenvalue analysis and the RIP. We briefly 
discuss some additional considerations for the phase transition framework. 

3.3.1. Phase Transitions with Noisy Measurements. In a practical setting, it is more 
reasonable to assume that the measurements are corrupted by noise, y = Ax-\-e for some noise vector 
e. The RIP has played a vital role in establishing stable signal recovery in the presence of noise 
for many decoders. When noise is present, the curves ps(S) bounding regions of strong equivalence 
serve as an upper bound to the curves depicting the regions of the phase plane which guarantee 
stable recovery. The RIP constants also describe how significantly the noise will be amplified by the 
encoder/decoder pairing, details are available for the Gaussian encoder and ^-minimization decoder 
[1] and greedy decoders [3] CoSaMP, Subspace Pursuit, and Iterated Hard Thresholding. Hassibi 
and Xu have developed a stability analysis of i 1 -minimization from the analysis of convex polytopes 
[53] . establishing substantially larger stability regions than the regions implied by the RIP. 
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Fig. 3.2. Example Improvements on boun ds on the StrongEquiv(A, i 1 ) phase transition, ps(8), f or Gaussian 



random matrices: (pQ(8), green), Theorem 3.5 (pg L (5), red), and (p e J np (S), black). 



3.3.2. Regions of Weak Equivalence and Average Case Performance. In many ap- 
plications, it may not be imperative that the decoder is able to reconstruct every ^-sparse vector. 
Instead, one may be willing to lose a small fraction of all possible /c-sparse signals. This is the 
behavior observed when a decoder is tested on /c-sparse vectors whose support sets are drawn uni- 
formly at random. Large scale empirical testing of CoSaMP, Subspace Pursuit and Iterated Hard 
Thresholding were compiled by Donoho and Maleki [19 . Most sparse approximation algorithms do 
not have a theoretical average case analysis. The poly tope analysis of Donoho and Tanner allows for 
analytical arguments providing a Region of Weak Equivalence where recovery is guaranteed for all 
but a but a small fraction of ^-sparse signals. An average case variant of the RIP is being developed, 
see [49] . 



3.3.3. Improving the RIP Phase Transition. It is possible that Thm. 3.3 could be im- 
proved with alternative methods of analysis. For example, Thm. |3.3| built off the work of Candes, 
Romberg, and Tao [7j[9j[T0] . In [7], Candes proved that if R(2k, n, N) < y/2—1, then ^-minimization 
will successfully recover every /c-sparse vect or. An asymmetric analysis and translation into the 
Strong Equivalence terminology of Sec. 3.1 produces a function p$(5) which bounds a region of 
strong equivalence. The alternative methods of Foucart and Lai leading to Thm. |3.3| provided a 
larger region of strong equivalence. See Fig. |3.2| 

Alternatively, the region of strong equivalence might be increased by improving the bounds on 
the RIP constants, C(5, p),U(5, p). If the method of analysis remained the same, we can explore the 
effects of improved bounds by examining the statements with empirically observed lower bounds on 
RIP constants for Gaussian matrices. As detailed in Table 12.11 the current bounds from Thm. 12.31 



are no more than twice the empirical RIP constants. Replacing the RIP constants with empirically 
observed lower bounds of the RIP constants (for n = 800) in p FL (k,n, N) gives us a function 
, which is an upper bound on the region of strong equivalence implied by 
this improvement is no more than 2.5 times pf L (£) for 5 G [1/20,20/21]. 



p e ™ p ( 5), s ee Figure 
Thm. 



3.3 



4. ^-regularization Phase Transitions for q G (0, 1] Implied by RIP Constants. Fou- 
cart and Lai improved on the previously best known RIP bounds of Candes (q = 1) and Chartrand 
(q G (0, 1)) [7j [T2] for ^-regularization. Theorem 3.3 is the simplest case of Foucart and Lai's results, 
for ^ 1 -regularization and x exactly k sparse. More generally, they considered the family of xq which 
satisfy a scaled approximate fit to 6, 



(4.1) 



\\Ax e -b\\ 2 <(l + U(2k,n,N))-e for 6 > 0. 
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Letting Xg be the argmin for the ^'-regularized constrained problem 



(4.2) 



min \\z\\ q subject to \\Az - b\\ 2 < (1 + U(2k, n, N)) • 0, 



Foucart and Lai bounded the discrepancy between x@ and any xq satisfying (4.1) in terms of the 
discrepancy between xq and its best k sparse approximation, 



(4.3) 



<?k(xo)q : = 



inf \\x e 

*Lo<fc 



Theorem 4.1 (Foucart and Lai [32]). Given q £ (0, 1], for any matrix A of size n x N with 
n < N and with RIP constants L(2k,n, N) and U{2k, n, N) and /z(2fc, n, N) defined as (3.1), if 



(4.4) 



a 



1/2-1/9 /i(2fca, n,N)<l for any 1 < a < 



n 

2k' 



then a solution Xq of (4-2) approximates any xq satisfying (4.1) within the bounds 



(4.5) 
(4.6) 



\\xe ~ 4\U < Ci ■ o-k{xe) q + D 1 ■ k 1 '^ 1 ' 2 ■ 
\\x e - x* e \\ 2 < C 2 ■ a k (x e ) q ■ (ah) 1 ' 2 - 1 '" + D 2 



with C\,C 2 ,D\, and D 2 functions of q, a, and ^Z^^riW) • 

The parameter a in Theorem |4.1|is a free parameter from the method of proof, and should be 



selected so as to maximize the region where (4.4) and/or other conditions are satisfied. For brevity 
we do not state the formulae for 61,62, -Di, and D 2 as functions of (/c,n,iV), but only state them 
in terms of their bounds for Gaussian random matrices as (fc,n, N) —> 00. 

ensures that if there 



in Theorem 



4.2 



4.1 



Although the solution of (4.2), x~q, has unknown sparsity, Theorem 
is a solution of (4.1), xq, which can be well approximated by a A: sparse vector, i.e. if crk(xo) q 
is small, then if (4.4) is satisfied the discrepancy between x@ and xq will be similarly small. For 
instance, if the sparsest solution of (4.2), xq, is k sparse, then ( |4.6[ ) implies that \\xq — x* e \\ 2 < D 2 -0; 
moreover, if = then Xq will be k sparse and satisfy Axq = b (in the case q = 1 this result is 



summarized as Theorem 3.3). Substituting bounds on the asymmetric RIP constants L(2ak, n, N) 
and U (2ak, n, N) from Theorem |2.3| we arrive at a quantitative version of Theorem |4.1| for Gaussian 
random matrices. 

Theorem 4.2. Given q £ (0, 1], for any e > 0, as (k,n,N) 00 with n/N -» S £ (0, 1) and 
k/n —> p £ (0, 1/2], if p < (1 — e)pg L (5', q) where pg L (S; q) is the maximum over 1 < a < l/2p of the 
solutions, p(5;q,a), of p a (5,2ap) := a 1 / 2_1 / g, /i((5, 2ap) = 1 with p(5,2p) defined as in \3.2\, there 



is overwhelming probability on the draw of A with Gaussian i.i.d. entries that a solution x@ of (4-2) 
approximates any xq satisfying \4-ty within the bounds 

(4.7) 
(4.8) 



\\x e - x* e \\ g < C^ap) ■ <T k (x e )q + D 1 (5,2ap) ■ k 1 '^ 1 ' 2 ■ 
\\x6-x* e \\ 2 <C 2 (S,2ap)-a k (x 9 ) q -(ak) 1 ^- 1 ^ + D 2 {5,2ap) 



The multiplicative "stability factors" are defined as: 
2 2 ^- 1 {l + p a {5,2apY) 1 li 



d(S,2ap) :-- 
C 2 (S,2ap) :-- 



(1 - p a {5,2ap)iy/i ' 

2 2 /"- 2 (/?(£, 2ap) + l- y/2) 
(l-(i a (5,2ap)iy/i ' 



Di(6,2ap) :-- 



2 2 /"- 1 /3(5,2ap) 
(l-/i a (<J,2ap)«)V9 



(4.9) D 2 (S, 2ap) .= + 2/ 3(f, 2ap) 



with (3(5, p) := (1 + V2)\ 



l+U(S,p) 
-AM ■ 




(c) (d) 

Fig. 4.1. The surface whose level curves specify p$ L (5; q,C\(5, p) < T) for q = 1 and q = 1/2 in Panels (b) 
and (d) respectively, with level curves for specific values of T shown in Panels (a) and (c) for q = 1 and q = 1/2 
respectively 



Unlike Theorem 3.5 which specifies on e fun ction p$ L (S) which bounds from below the phase 
transition for Strong Equiv (A, i 1 ), Theorem 



4.2 



specifies a multiparameter family of threshold func- 
tions depending on q and possibly with further dependence on bounds on the multiplicative stability 
factors, such as C±(5, p). The function p FL (S) in Theorem 
and no bounds on the stability parameters. The function p$ 



0,<z=l, 
corresponds 



) corresponds to the c ase 
^ FL (S;q) in Theorem 

to £ q regularization with unbounded stability coefficients, and as a result, it is only meaningful for 
p strictly below p$ L (5',q) or in the case where there exists a k sparse solution to Axq = b and 
= 0. More generally, specifying a bound on one or more of the multiplicative stability factors 
determines functions p$ L (5; q, Cond). For instance, imposing a b ound of T on the stability factor 

shows pg L (5; q,Ci(5, p) < T) for 



4.1 



Ci(d,p) generates a function pg L (5; q,Ci(5, p) < T); Figure 
q = 1 and q = 1/2 in panels (a-b) and (c-d) respectively. Software is available upon request which 
will generate functions with these and other choices of parameters in Theorem |4.2| 

4.1. Discussion. The lower bound on the Strong Equiv (A, i 1 ) phase transition implied by the 
RIP for strictly k sparse signals, p FL ($) of Theorem 3.3 , does not have any implied stability. In 
order to ensure stability, Theorem |3.3| requires further restrictive bounds on the stability factors in 
further reducing the lower bound on the phase transition. For example, Ci(S,p) for 
(b), with level curves of Ci(5, p) corresponding to fixed stability factors 



4.1 



Theorem 
q = 1 is shown in Figure 



4.1 



proceeding (4.3) in Theorem |4.1| ph ase transitions below which specified bounds on C\{8,p) can 
be ensured are shown in Figure |4TT| (a-b). The stability factor becomes unbounded at finite p as 
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pU F s L {5). 

A current trend in sparse approximation is to consider i^-regularization for q G (0, 1), with the 
aim of increasing the recoverability region p~2j H3]. Existing results have shown that indeed the 
region where ^-regularization successfully recovers k sparse vectors at least does not decrease as q 
decreases [35], though little is known as to the rate, if any, at which it increases. Theorem 4.1 gives 
lower bounds on these regions where i^-regularization is guaranteed to have specified recoverability 
properties, and in fact for any strictly k sparse vector it implies that if (|3.1|) is finite, there is a small 



enough q such every k sparse vector can be recovered exactly from (6, A) by solving (4.2) with 6 = 0. 
Despite this and other encouraging results, many fundamental questions about ^-regularization 
remain, in particular how to find the global minimizer of (|4.2|). Moreover, it is unknown if t q - 



regularization remains stable as q decreases. In order to ensure stability, Theorem |3.3| requires 
further restrictive bound on the stability factor in Theorem |4.1[ further reducing the lower bound 
on the phase transition. For example, C\{8,p) for q = 1/2 is shown in Figure [XT] (d), with level 
curves of C\{8,p) corresponding to fixed stability factors preceeding (4.3) in Theorem |4.1| phase 
transitions below which specified bounds on C\{8,p) can be ensured are shown in FigurejXl] (c-d). 



Decreasing q from 1 to 1/2 does increase the value of p at which the stability factors in Theorem 4.1 
become unbounded; however, comparing Figure |4.1| (d) and (b) it is apparent that this elevating of 
the unstable phase transition comes at the price of also elevating C\ (J, p) for small values of p. In 
particular, the region where C\(8,p) < 50 is, in fact, larger for q = 1 than for q = 1/2. 
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